RapidMiner Studio documentation for version 10.2
Stem Tokens using ExampleSet
(Operator Toolbox)
Synopsis
Replaces terms by pattern matching rules. This operator uses an ExampleSet to stem a list of words inside a ''Process Documents'' operator.
Description
This operator can be used inside a ''Process Documents'' operator and lets you provide a custom list of stemming rules. It works like the Stem (Dictionary) operator, except that the rules come from an ExampleSet rather than a file.
It reduces terms to a base form using an external ExampleSet with replacement rules. The ExampleSet must contain one rule per line, in the form
targetExpression:pattern1 pattern2 ...
where targetExpression is the term to which an input term is reduced if it matches any of the patterns. Each patternX is a simple string or a regular expression. A simple example is the mapping
weekday : .*day
Keep in mind that very short words are filtered out by the default settings of the TextInput operators.
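The rule format described above can be illustrated with a minimal Python sketch. This is not the operator's implementation. The function names parse_rule and stem_tokens are hypothetical, and the sketch assumes that patterns are separated by whitespace, that whitespace around the colon is ignored, that a pattern must match the whole token, and that the first matching rule wins.

```python
import re

def parse_rule(line):
    """Split 'target : pattern1 pattern2 ...' into (target, [compiled patterns])."""
    # Split on the first colon only; everything after it is the pattern list.
    target, _, patterns = line.partition(":")
    return target.strip(), [re.compile(p) for p in patterns.split()]

def stem_tokens(tokens, rule_lines):
    """Replace each token by the target of the first rule with a matching pattern."""
    rules = [parse_rule(line) for line in rule_lines]
    stemmed = []
    for token in tokens:
        for target, patterns in rules:
            # Assumption: a pattern must match the whole token, not a substring.
            if any(p.fullmatch(token) for p in patterns):
                token = target
                break
        stemmed.append(token)
    return stemmed

rules = ["weekday : .*day"]
tokens = ["meeting", "on", "monday", "or", "friday"]
print(stem_tokens(tokens, rules))
# prints ['meeting', 'on', 'weekday', 'or', 'weekday']
```

Under these assumptions, a broad pattern such as .*day also matches tokens like ''holiday'' or ''today'', so a stricter pattern may be preferable when only weekday names should be replaced.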
Input
doc
The documents input port.
exa (Data Table)
The ExampleSet with the tokens.
Output
doc
The resulting document.
Parameters
- attribute The name of the attribute that should be used for stemming. Range:
Tutorial Processes
Stem weekdays from a document
In this example, we replace the names of weekdays with the word ''weekday''.